## The datatable function from the DT package creates an HTML widget that displays the dataset
## Install the DT package if it is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |>
  DT::datatable()
BCon 147: special topics
1 Project overview
In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.
2 Scenario
Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.
Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.
3 Understanding the source of data
The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.
This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.
4 Data wrangling
4.1 Data importation
Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.
Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by.
Save the merged dataset as hr_perf_dta.
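As a minimal sketch of how the join behaves, the toy tibbles below (hypothetical values, not taken from the dataset) show that left_join keeps every row of the left table, fills unmatched rows with NA, and repeats an employee when more than one rating row matches:

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only
employee_toy <- tibble(
  EmployeeID = c("E1", "E2", "E3"),
  Department = c("Sales", "Technology", "Sales")
)
rating_toy <- tibble(
  EmployeeID    = c("E1", "E1", "E2"),
  ManagerRating = c(3, 4, 5)
)

# Every row of employee_toy is kept: E1 appears twice (two ratings),
# and E3 gets NA because it has no matching rating row
joined_toy <- employee_toy |>
  left_join(rating_toy, by = "EmployeeID")

joined_toy
```

If the real performance table holds several reviews per employee, the merged dataset will likewise contain more rows than employee_dta, which is worth keeping in mind when counting employees later.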
## import the two datasets (read_csv comes from readr, left_join from dplyr)
library(readr)
library(dplyr)
employee_dta <- read_csv("dataset/Employee.csv")
perf_rating_dta <- read_csv("dataset/PerformanceRating.csv")
## merge employee_dta and perf_rating_dta
hr_perf_dta <-
  employee_dta |>
  left_join(perf_rating_dta, by = "EmployeeID")
## Use datatable from the DT package to display the merged dataset
DT::datatable(hr_perf_dta)
4.2 Data management
Using the clean_names function from the janitor package, standardize the variable names following the recommended naming convention.
Save the result as hr_perf_dta to update the dataset.
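To illustrate what clean_names does, here is a minimal sketch on a hypothetical toy data frame (the column values are made up). By default the function converts column names to snake_case:

```r
library(janitor)

# Hypothetical toy data frame with mixed-case column names
toy <- data.frame(
  EmployeeID      = 1:2,
  JobRole         = c("Analyst", "Manager"),
  WorkLifeBalance = c(3, 4)
)

# clean_names() rewrites the column names in snake_case by default
toy_clean <- clean_names(toy)

names(toy_clean)
```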
## clean names using the janitor package and save as hr_perf_dta
hr_perf_dta <-
  hr_perf_dta |>
  janitor::clean_names()
Create a new variable cat_education wherein education is recoded as 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.
Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Recode the values as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.
Create new variables cat_work_life_balance, cat_self_rating, and cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Recode the values as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.
Save all the changes in hr_perf_dta.
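The recoding pattern can be sketched with case_when inside mutate. The example below runs on a hypothetical toy tibble (values made up) and covers only two of the variables; the labels follow the coding scheme stated in the task, and any value not listed would become NA:

```r
library(dplyr)

# Hypothetical toy data; only the coding scheme comes from the task text
toy <- tibble(
  education        = c(1, 3, 5),
  job_satisfaction = c(2, 4, 5)
)

toy_recoded <- toy |>
  mutate(
    # Map each numeric education code to its label
    cat_education = case_when(
      education == 1 ~ "No formal education",
      education == 2 ~ "High school",
      education == 3 ~ "Bachelor",
      education == 4 ~ "Masters",
      education == 5 ~ "Doctorate"
    ),
    # Same pattern for a satisfaction scale
    cat_job_sat = case_when(
      job_satisfaction == 1 ~ "Very dissatisfied",
      job_satisfaction == 2 ~ "Dissatisfied",
      job_satisfaction == 3 ~ "Neutral",
      job_satisfaction == 4 ~ "Satisfied",
      job_satisfaction == 5 ~ "Very satisfied"
    )
  )

toy_recoded
```

The remaining variables can follow the same pattern with their own label sets.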
## create cat_education
## create cat_envi_sat, cat_job_sat, and cat_relation_sat
## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
## print the updated hr_perf_dta using datatable function
5 Exploratory data analysis
5.1 Descriptive statistics of employee attrition
Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.
Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance.
Attrition rate across job_role has been done for you! You have the freedom to customize your plot accordingly. Show your creativity!
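## calculating and plotting attrition rate
## reorder_within() and scale_y_reordered() come from the tidytext package
library(dplyr)
library(ggplot2)
library(tidytext)

hr_perf_dta |>
  group_by(job_role) |>
  count(attrition) |>
  ## share of each attrition value within a job role
  mutate(pct_attrition = n / sum(n)) |>
  ungroup() |>
  ## order job roles separately within each attrition facet
  mutate(job_role = reorder_within(job_role, pct_attrition, attrition)) |>
  ggplot(aes(pct_attrition, job_role, fill = attrition)) +
  geom_col(position = "dodge", width = 0.8) +
  scale_y_reordered() +
  facet_wrap(~ attrition, scales = "free_y", ncol = 1) +
  labs(x = "Attrition rate",
       y = "Job role")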
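For the numeric variables (age, salary), one option is to bin the values before computing the rate. The sketch below is only an illustration on a hypothetical toy tibble: it assumes attrition is coded as "Yes"/"No", and the break points and group labels are arbitrary choices, not values taken from the dataset.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration
toy <- tibble(
  age       = c(22, 27, 34, 38, 45, 52),
  attrition = c("Yes", "No", "Yes", "No", "No", "No")
)

age_attrition <- toy |>
  # Bin age into groups; the break points are arbitrary assumptions
  mutate(age_group = cut(age,
                         breaks = c(17, 30, 40, 60),
                         labels = c("18-30", "31-40", "41-60"))) |>
  group_by(age_group) |>
  # Share of employees in each group whose attrition is "Yes"
  summarise(attrition_rate = mean(attrition == "Yes"), .groups = "drop")

age_attrition
```

The resulting table has one row per age group and can be passed to ggplot in the same way as the job_role summary.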